Pointwise mutual information

Pointwise mutual information (PMI), or point mutual information, is a measure of association used in information theory and statistics.

Definition

The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence under their joint distribution and the probability of their coincidence expected under independence, i.e., given only their individual distributions. Mathematically:


\operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)p(y)} = \log\frac{p(x|y)}{p(x)} = \log\frac{p(y|x)}{p(y)}.

The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes.

The measure is symmetric (pmi(x;y) = pmi(y;x)). It can take positive or negative values, but is zero if X and Y are independent. PMI is maximized when X and Y are perfectly associated (i.e., p(x|y) = 1 or p(y|x) = 1), yielding the following bounds:


-\infty \leq \operatorname{pmi}(x;y) \leq \min\left[ -\log p(x), -\log p(y) \right]

Finally, \operatorname{pmi}(x;y) will increase if p(x|y) is fixed but p(x) decreases.

Here is an example to illustrate:

x   y   p(x, y)
0   0   0.1
0   1   0.7
1   0   0.15
1   1   0.05

Using this table we can marginalize to get the following additional table for the individual distributions:

value   p(x)   p(y)
0       0.8    0.25
1       0.2    0.75

With this example, we can compute the four values of pmi(x;y) using base-2 logarithms:

pmi(x=0;y=0) −1
pmi(x=0;y=1) 0.222392421
pmi(x=1;y=0) 1.584962501
pmi(x=1;y=1) −1.584962501

(For reference, the mutual information \operatorname{I}(X;Y) would then be 0.214170945.)
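
The same numbers can be reproduced with a short Python sketch (the dictionaries and helper function below are purely illustrative, not part of any standard library):

from math import log2

# Joint distribution p(x, y) from the table above.
p_xy = {(0, 0): 0.10, (0, 1): 0.70, (1, 0): 0.15, (1, 1): 0.05}

# Marginal distributions p(x) and p(y), obtained by summing out the other variable.
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

def pmi(x, y):
    # Pointwise mutual information of the pair of outcomes (x, y), in bits.
    return log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))

for x in (0, 1):
    for y in (0, 1):
        print(f"pmi(x={x};y={y}) = {pmi(x, y):.9f}")

# Mutual information I(X;Y) is the expected value of the PMI under p(x, y).
mi = sum(p * pmi(x, y) for (x, y), p in p_xy.items())
print(f"I(X;Y) = {mi:.9f}")  # 0.214170945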

Similarities to Mutual Information

Pointwise mutual information has many of the same relationships as mutual information. In particular,


\begin{align}
\operatorname{pmi}(x;y) &= h(x) + h(y) - h(x,y) \\ 
 &= h(x) - h(x|y) \\ 
 &= h(y) - h(y|x)
\end{align}

where h(x) is the self-information of the outcome x, i.e. -\log_2 p(X=x), and h(x,y) and h(x|y) are the corresponding joint and conditional self-informations.
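
A quick numerical check of the first identity, using the entry p(x=0, y=1) = 0.7 from the example above (again just an illustrative sketch):

from math import log2

p_x, p_y, p_xy = 0.8, 0.75, 0.7   # p(x=0), p(y=1), p(x=0, y=1) from the tables above
h = lambda p: -log2(p)            # self-information, in bits

pmi_direct   = log2(p_xy / (p_x * p_y))    # log p(x,y) / (p(x) p(y))
pmi_identity = h(p_x) + h(p_y) - h(p_xy)   # h(x) + h(y) - h(x,y)
print(pmi_direct, pmi_identity)            # both equal 0.222392421...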

Normalized pointwise mutual information (npmi)

Pointwise mutual information can be normalized to the interval [-1, +1], resulting in -1 (in the limit) for outcomes that never occur together, 0 for independence, and +1 for complete co-occurrence.



\operatorname{npmi}(x;y) = \frac{\operatorname{pmi}(x;y)}{h(x,y)},

where h(x,y) = -\log p(x,y) is the joint self-information.
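
A minimal Python sketch of this normalization (illustrative only; it assumes base-2 logarithms and the h(x,y) normalization given above):

from math import log2

def npmi(p_xy, p_x, p_y):
    # Normalized pointwise mutual information, bounded in [-1, +1].
    return log2(p_xy / (p_x * p_y)) / -log2(p_xy)

print(npmi(0.70, 0.80, 0.75))   # partial association: about 0.43
print(npmi(0.25, 0.25, 0.25))   # complete co-occurrence: exactly +1.0
print(npmi(0.06, 0.20, 0.30))   # independence (p(x,y) = p(x)p(y)): 0.0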

Chain-rule for pmi

Pointwise mutual information obeys the chain rule, that is,

\operatorname{pmi}(x;yz) = \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y)

This is easily proven by:


\begin{align}
\operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y) & {} = \log\frac{p(x,y)}{p(x)p(y)} + \log\frac{p(x,z|y)}{p(x|y)p(z|y)} \\ 
& {} = \log \left[ \frac{p(x,y)}{p(x)p(y)} \frac{p(x,z|y)}{p(x|y)p(z|y)} \right] \\ 
& {} = \log \frac{p(x|y)p(y)p(x,z|y)}{p(x)p(y)p(x|y)p(z|y)} \\
& {} = \log \frac{p(x,yz)}{p(x)p(yz)} \\
& {} = \operatorname{pmi}(x;yz)
\end{align}
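
The chain rule can also be checked numerically; the following Python sketch uses an arbitrary, made-up joint distribution over three binary variables purely for illustration:

from math import log2, isclose

# An arbitrary, made-up joint distribution p(x, y, z) over three binary variables.
p_xyz = {(0, 0, 0): 0.10, (0, 0, 1): 0.15, (0, 1, 0): 0.05, (0, 1, 1): 0.30,
         (1, 0, 0): 0.10, (1, 0, 1): 0.05, (1, 1, 0): 0.20, (1, 1, 1): 0.05}

def marg(keep):
    # Marginalize p(x, y, z) onto the given variable positions (0 = x, 1 = y, 2 = z).
    out = {}
    for key, p in p_xyz.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

p_x, p_y = marg([0]), marg([1])
p_xy, p_yz = marg([0, 1]), marg([1, 2])

x, y, z = 0, 1, 1
pmi_x_yz = log2(p_xyz[(x, y, z)] / (p_x[(x,)] * p_yz[(y, z)]))
pmi_x_y  = log2(p_xy[(x, y)] / (p_x[(x,)] * p_y[(y,)]))
# Conditional PMI: pmi(x;z|y) = log p(x,z|y) / (p(x|y) p(z|y)).
pmi_x_z_given_y = log2((p_xyz[(x, y, z)] / p_y[(y,)]) /
                       ((p_xy[(x, y)] / p_y[(y,)]) * (p_yz[(y, z)] / p_y[(y,)])))

assert isclose(pmi_x_yz, pmi_x_y + pmi_x_z_given_y)   # the chain rule holds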
